Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

Authors

  • Young-Bum Kim
  • Benjamin Snyder
Abstract

In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet. Adopting a classical Bayesian perspective, we perform posterior inference over hundreds of languages, leveraging knowledge of known languages and alphabets to uncover general linguistic patterns of typologically coherent language clusters. We achieve average accuracy in the unsupervised consonant/vowel prediction task of 99% across 503 languages. We further show that our methodology can be used to predict more fine-grained phonetic distinctions. On a three-way classification task between vowels, nasals, and nonnasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages.

Similar resources

Predicting Consonant Intelligibility in Normal-Hearing Listeners Using Microscopic Models with Different Distance Measures in an Automatic Speech Recognizer

In this study, recognition rates of consonants available in vowel-consonant-vowel structure in hearing tests and two microscopic models will be investigated. Such a syllable structure doesn’t exist in Farsi and Azerbaijani languages, but since the goal is only recognition of middle phoneme, according to hearing tests, listeners are able to properly recognize phonemes in clean speech conditions....

Assimilation of Final Low Back Vowel in Eghlidian Dialect

In this article, the low back vowel /A/ in word-final positions in Eghlidian dialect, one of Persian dialects, is studied. This vowel is represented phonetically as [A], [o] and [@] in different phonetic environments. Therefore many words were collected via interviewing ten native speakers so that these different alternant forms can be accounted for appropriately. Since one of the authors of th...

Study on the Anticipatory Coarticulatory Effect of Chinese Disyllabic Sequences

In this study, the Vowel-to-Vowel (V-to-V) coarticulatory effect in the Vowel-Consonant-Vowel (VCV) sequences is investigated, and the F2 offset value of the first vowel is analyzed. Results show that, in the trans-segment context, anticipatory coarticulation exists in Chinese. Due to high articulatory strength of aspirated obstruents, in the context of subsequent vowel /i/, the V1 F2 offset va...

Recognition of Tamil Syllables Using Vowel Onset Points with Production, Perception Based Features

Tamil Language is one of the ancient Dravidian languages spoken in south India. Most of the Indian languages are syllabic in nature and syllables are in the form of Consonant-Vowel (CV) units. In Tamil language, CV pattern occurs in the beginning, middle and end of a word. In this work, CV Units formed with Stop Consonant – Short Vowel (SCSV) were considered for classification task. The work ca...

Issues of phonological complexity: Statistical analysis of the relationship between syllable structures, segment inventories and tone contrasts

It is often suggested that languages are likely to ‘compensate’ complexity in one subsystem by simplicity elsewhere. In this paper evidence against this idea is presented by examining several subsystems of the basic phonology in a set of over 600 languages selected to represent genetic and areal diversity. The relationships between elaboration of the syllable canon, the size of segment inventor...

Publication date: 2013